Papers and Reading :
- Mikolov et al., word2vec
- arXiv:1402.3722v1
- Stack Overflow: why does word2vec use two representations for each word?
- https://arxiv.org/pdf/1310.4546.pdf
- Paragraph2Vec (Le & Mikolov, ICML 2014)
- MRNet-Product2Vec ECML-PKDD 2017.
- Crowdsourcing (Jeff Howe's book).
- Raykar et al., JMLR 2010 (EM algorithm).
- Missing labels: Raykar et al., JMLR 2010.
- Modeling task complexity: Welinder et al. 2010 and Whitehill et al., NIPS 2010.
- Sequential crowdsourced labeling as an MDP: Raykar et al.
- Deep Structured Semantic Model (DSSM), Microsoft Research
- vis-w2v, CVPR 2016
- arXiv:1602.05568
- arXiv:1801.03244, ICLR 2018
- iWGAN (improved WGAN), Gulrajani et al.
- EcommerceGAN
t-SNE : projects high-dimensional data down to two or three dimensions for visualization.
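A minimal sketch of the t-SNE step above using scikit-learn; the library choice, toy data, and parameter values are assumptions for illustration, not from the notes:

```python
# Hedged sketch: project high-dimensional vectors (e.g. word embeddings)
# down to 2-D with t-SNE for visualization. The input here is random
# data standing in for real embeddings.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # 50 "embeddings" in 10 dimensions

# perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
emb = tsne.fit_transform(X)

print(emb.shape)  # (50, 2) -- one 2-D point per input vector
```

The 2-D output `emb` is what you would then scatter-plot, labeling each point with its word.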
For crowdsourced labeling : CrowdFlower, Amazon Mechanical Turk.
There's an inherent computational challenge in word2vec: with a 10,000-word vocabulary and 300-dimensional embeddings, each weight matrix has 3 million weights, and a full softmax touches all of them on every training step. Negative sampling overcomes this by updating only the positive context word plus a few sampled negatives per example.
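A back-of-envelope check of the arithmetic above (V and d come from the note; the negative-sample count k = 5 is a typical value, assumed here):

```python
# Vocabulary size and embedding dimension from the note.
V, d = 10_000, 300

# word2vec keeps two V x d matrices (input/center and output/context
# representations), each with V*d weights.
params_per_matrix = V * d          # 3,000,000 weights, as the note says

# Full softmax: every step computes scores against all V output
# vectors, so gradients touch V*d output-side weights.
full_softmax_updates = V * d

# Negative sampling with k negatives: only the 1 positive context word
# plus k sampled negatives get output-side updates per training pair.
k = 5                              # assumption, not from the note
neg_sampling_updates = (1 + k) * d

print(params_per_matrix)           # 3000000
print(neg_sampling_updates)        # 1800
```

So negative sampling shrinks the per-step output-side work from 3,000,000 weights to 1,800 in this setting, which is why it makes training tractable.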
Limitations :
Open Problems :